ABSTRACT

Toxin-antitoxin systems are widespread in bacteria and archaea. They perform diverse functional roles, including the generation of persistence, maintenance of genetic loci and resistance to bacteriophages through abortive infection. Toxin-antitoxin systems have been divided into three types, depending on the nature of the interacting macromolecules. The recently discovered Type III toxin-antitoxin systems encode protein toxins that are inhibited by pseudoknots of antitoxic RNA, encoded by short tandem repeats upstream of the toxin gene. Recent studies have identified the range of Type I and Type II systems within current sequence databases. Here, structure-based homology searches were combined with iterative protein sequence comparisons to obtain a current picture of the prevalence of Type III systems. Three independent Type III families were identified, according to toxin sequence similarity. The three families were found to be far more abundant and widespread than previously known, with examples throughout the Firmicutes, Fusobacteria and Proteobacteria. Functional assays confirmed that representatives from all three families act as toxin-antitoxin loci within Escherichia coli and that at least two of the families confer resistance to bacteriophages. This study shows that active Type III toxin-antitoxin systems are far more diverse than previously known, and suggests that more remain to be identified.



INTRODUCTION 

Bacteria are constantly faced with environmental stresses and the threat of viral predation. Through adaptive evolution they have developed multiple systems to ensure survival, including the toxin-antitoxin (TA) systems (1-4). TA systems are near-ubiquitous throughout the plasmids and chromosomes of bacteria and archaea and usually comprise bicistronic operons encoding two small genes: one for a toxic component and a second for a cognate antitoxin (1,2,5). Though originally identified in 1983 as plasmid maintenance systems (6), TA systems have been implicated in many physiological roles, including formation of persister cells (7), stress resistance (8), protection from bacteriophages (phages) (9) and regulation of biofilm formation (10), among others (11).

TA systems have been divided into three types, depending on the nature of the interacting toxin and antitoxin macromolecules (4). Within Type I systems, an RNA antitoxin interacts with the toxin transcript and inhibits translation of the toxic protein (12). The toxins and antitoxins of Type II systems interact as proteins (5). Both Type I and Type II systems were originally identified through their role in plasmid maintenance (5,12). The recently discovered Type III systems, first identified by their ability to abort phage infections (9), rely upon the direct interaction of an RNA antitoxin with the toxin protein (13).

Recent studies have identified an ever-increasing number of experimentally defined, or putative, Type I and Type II TA systems (5,12,14-16). Type I toxins are generally small proteins of <60 amino acids. Type I antitoxins are usually encoded as an antisense RNA product, but they can be transcribed divergently from the toxin (12). By combining iterative PSI-BLAST searches with Type I-specific parameters (such as the presence of tandem copies of the full loci, the free-energy minima of the antitoxin and the presence of transmembrane domains), one study detected multiple copies of known Type I loci within new hosts across 774 bacterial genomes, and also identified and experimentally validated novel Type I TA systems (12). In some cases, many Type I TA system families, and multiple members thereof, were identified in the same host; in the extreme case, Escherichia coli O157:H7 str. Sakai contained 26 Type I loci (12). Comparison of the phylogenetic tree of these Type I loci with the host taxonomy suggested that Type I loci have not been freely disseminated by horizontal gene transfer but may share a common ancient ancestor (12).

Global identifications of Type II systems have often relied on identifying putative toxin and antitoxin genes by distant sequence similarity, followed by 'guilt-by-association', whereby the homologue pairs must cluster into a putative bicistron (5,16). Type II systems have previously been grouped according to the toxin genes present in the locus (2), though they could be re-classified at the level of toxin structure (4), which correlates well with distant sequence similarity (3,5). The most recent global bioinformatic search identified 12 toxin and 20 antitoxin superfamilies within 2181 prokaryotic genomes (5). That study also highlighted the mosaic nature of Type II systems: whereas specific toxin genes were previously thought to be associated with specific antitoxin genes, there is clear heterogeneity and interchangeability within Type II systems (5). Type II loci were observed in even greater numbers than Type I loci, peaking at 97 Type II loci within a single genome (5). Unlike Type I systems, Type II systems appear to have undergone significant horizontal gene transfer (5). Type II systems are also highly abundant in mobile genetic elements, which may relate to their 'addictive' nature and ability to maintain these replicons (17). Web-based tools, such as the search engine RASTA (18) and the database TADB (19), have been developed to assist in the identification and cataloguing of Type II systems.
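
The 'guilt-by-association' step can be illustrated with a short Python sketch. It assumes that each predicted gene already carries a homology-derived label ("toxin", "antitoxin" or "other"); the Gene record, the putative_bicistrons helper and the 50 bp gap cut-off are hypothetical choices for illustration, not the parameters used in refs (5) or (16).

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    role: str    # hypothetical homology label: "toxin", "antitoxin" or "other"


def putative_bicistrons(genes, max_gap=50):
    """Pair neighbouring toxin and antitoxin homologues that could form one operon.

    `genes` must be sorted by start coordinate. `max_gap` (bp) is an
    illustrative threshold, not a value taken from the cited studies.
    """
    pairs = []
    for left, right in zip(genes, genes[1:]):
        if left.strand != right.strand:
            continue  # a bicistron is transcribed from a single strand
        gap = right.start - left.end - 1  # negative when the two genes overlap
        if gap > max_gap:
            continue
        if {left.role, right.role} == {"toxin", "antitoxin"}:
            pairs.append((left.name, right.name, gap))
    return pairs


genes = [
    Gene("a1", 100, 400, "+", "antitoxin"),
    Gene("t1", 397, 700, "+", "toxin"),
    Gene("x1", 900, 1500, "-", "other"),
    Gene("t2", 2000, 2300, "+", "toxin"),
]
print(putative_bicistrons(genes))  # [('a1', 't1', -4)]
```

Overlapping genes give a negative gap and are kept rather than rejected, since the cut-off only limits how far apart the two genes may be.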

The recently identified Type III TA loci were originally isolated and defined as abortive infection systems, protecting bacterial populations from bacteriophage (phage) assault (9,20,21). Within each Type III locus, a toxin gene is preceded by a short palindromic repeat, which is itself preceded by a tandem array of nucleotide repeats (Figure 1A). The short palindromic repeat acts as a transcriptional terminator, regulating the relative levels of antitoxic RNA and toxin transcript (9,21). The first Type III TA system, ToxIN, was encoded on plasmid pECA1039 of the Gram-negative phytopathogen Pectobacterium atrosepticum. This locus encodes a 19.7-kDa toxic protein, ToxN, and upstream there is a repetitive array containing 5.5 tandem repeats of a 36 nt sequence, collectively known as the ToxI antitoxin. Through genetic studies, it was predicted that each 36 nt ToxI RNA repeat was able to inhibit the activity of ToxN (9). The crystal structure of the ToxIN complex revealed a heterohexameric triangular assembly of three ToxN proteins interspersed by three 36 nt ToxI RNA pseudoknots (4,13) (Figure 1B and C). ToxN was demonstrated to be an endoribonuclease, related in structure to the endoribonucleases Kid and MazF (13).

Following characterization of ToxIN, simple BLAST (22) searches using the ToxN amino acid sequence identified multiple homologues (9). These were found on the plasmids and chromosomes of both Gram-negative and Gram-positive species, within human and animal pathogens, oceanic and soil bacteria and extremophiles. Though these were identified by shared identity with ToxN, the cognate ToxI varied greatly in the number, length and underlying sequence of its repeats (9). These related ToxN proteins are therefore likely to bind very different RNA sequences, providing a whole new set of complexes in which to study protein-RNA interactions. Judging from the macromolecular structure of ToxIN and the offset nature of the ToxI pseudoknot sequence (13), the specific RNA sequence that binds each ToxN cannot be trivially deduced from the pattern of DNA repeats; it is more likely that each pseudoknot RNA is generated across the DNA repeat boundaries.
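
The point about repeat boundaries can be made concrete with a toy example. The sketch below builds a tandem array from a made-up 6-nt unit and reads unit-length windows starting from an offset, so every window is a rotation of the DNA repeat rather than the repeat itself. The unit, the offset and the helper names are hypothetical; the true register of each ToxI pseudoknot has to come from structural data, as it did for ToxIN (13).

```python
def repeat_array(unit, copies):
    """Concatenate `copies` tandem repeats of `unit`; a fractional copy (e.g. 5.5) adds a partial repeat."""
    full = int(copies)
    partial = round((copies - full) * len(unit))
    return unit * full + unit[:partial]


def offset_windows(array, unit_len, offset):
    """Unit-length windows starting `offset` nt into the array.

    For 0 < offset < unit_len every window spans the junction between two
    adjacent DNA repeats, so it is a rotation of the repeat unit.
    """
    windows = []
    pos = offset
    while pos + unit_len <= len(array):
        windows.append(array[pos:pos + unit_len])
        pos += unit_len
    return windows


array = repeat_array("ACGTTA", 2.5)       # made-up 6-nt repeat unit
print(array)                               # ACGTTAACGTTAACG
print(offset_windows(array, 6, 2))         # ['GTTAAC', 'GTTAAC']
print(len(repeat_array("N" * 36, 5.5)))    # 198 nt, i.e. 5.5 x 36-nt repeats
```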

Just as the number and diversity of identified Type I and Type II loci continue to increase (5,12,16), it follows that the current list of Type III TA loci must be hugely under-representative. When the structure of ToxN was first obtained, a comparative search of entries within the PDB identified the Type II toxin proteins Kid, MazF, CcdB and YdcE as high-scoring structural homologues of ToxN (4,13). The structures of members of this Kid/MazF/CcdB superfamily overlay well with the core regions of ToxN, though ToxN has specific additional folds that appear to allow binding of the antitoxic RNA pseudoknot (4,13). While these comparisons relied upon known structures, it was also possible to perform structure-based homology searches to identify novel Type III systems de novo. Initial searches using FUGUE (23) generated a list of 880 putative ToxN homologues. Owing to the complexity of the Type III locus, no algorithm was available with which to sort these putative hits. Each hit was therefore assessed on a case-by-case basis, identifying those hits that matched the criteria of a Type III TA locus. Having identified putative loci, iterative BLAST searches were performed to expand the list of potential homologues. In this manner, 125 putative Type III TA loci were identified. These hits were further divided into three families, according to protein sequence homology. Phylogenetic analyses were performed to assess the distribution of these Type III loci in relation to the taxonomy of their hosts. For the first time, we were able to identify putative Type III loci in both an archaeal species and a bacteriophage. The findings were validated experimentally by taking exemplars of each family and testing for toxic and antitoxic effects within a reconstructed E. coli over-expression model. Each locus tested was confirmed as an active TA system. Furthermore, these loci were tested for their ability to protect from infection by coliphages. One of the newly identified loci conferred phage resistance. This global analysis using structure-based homology models has greatly increased our knowledge of the prevalence and diversity of Type III TA systems within sequenced prokaryotic genomes.
Figure 1. Overview of Type III toxin-antitoxin loci. (A) Schematic of a Type III toxin-antitoxin locus. The paradigmatic Type III system, ToxIN from P. atrosepticum, is depicted. The toxin gene, toxN, is preceded by a stem-loop structure formed from a palindromic repeat. This is itself preceded by a set of tandem nucleotide repeats which act as antitoxic RNA pseudoknots. In the case of ToxIN, the antitoxin, ToxI, is encoded by 5.5 tandem repeats of 36 nt (yellow arrows for a full repeat, grey arrow for the half repeat). The locus is transcribed from a constitutive promoter, with the -35 and -10 elements shown as black boxes and the transcriptional start site shown by a black arrow. (B) ToxIN trimer (PDB: 2XDD), with ToxI monomers shown as cartoons and ToxN monomers shown with electrostatic surfaces, where red represents electronegative potential and blue is electropositive. Three monomers of ToxN are held at respective corners of a heterohexameric triangular assembly, formed entirely through protein-RNA interactions with the interspersed pseudoknots of 36 nt ToxI RNAs. (C) ToxIN trimer (PDB: 2XDD) with ToxI shown as yellow sticks and ToxN as cyan cartoons. Figure and legend adapted with the author's permission from Ref. (4).



MATERIALS AND METHODS 

Bioinformatic approach 

Homology-based searches 

Based on the crystal structure of ToxIN (PDB: 2XDD), an environment-specific substitution profile was created using JOY (24) and FUGUE (23), with additional homologous sequences collected from the NCBI nr database. Sequences of bacterial and archaeal proteomes were downloaded from integr8 (25) (http://www.ebi.ac.uk/integr8/) using a custom script. The fugueprf programme in the FUGUE suite was used to perform structure-based homology searches against these sequences with the ToxN profile as a query. All hits with Z-scores >3.5 were collected. FUGUE has been widely used for detecting remote homologues, and previous benchmark results suggest that Z-scores >6.0, >4.0 and >3.5 correspond to confidence levels of 99, 95 and 90%, respectively.
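
As a minimal sketch of the hit-collection step, the Python below keeps hits above the Z-score cut-off and annotates each with the benchmark confidence band quoted above. The input format (identifier, Z-score pairs) and the helper names are assumptions; fugueprf's own output would first need to be parsed into this form.

```python
# Benchmark bands quoted in the text: Z > 6.0 ~ 99%, Z > 4.0 ~ 95%, Z > 3.5 ~ 90%.
CONFIDENCE_BANDS = [(6.0, 99), (4.0, 95), (3.5, 90)]


def confidence(z):
    """Return the confidence level (%) of the highest band whose threshold z exceeds, or None."""
    for threshold, level in CONFIDENCE_BANDS:
        if z > threshold:
            return level
    return None


def collect_hits(hits, min_z=3.5):
    """Keep (identifier, Z-score) hits with Z > min_z, highest Z first, with their confidence band."""
    kept = [(name, z, confidence(z)) for name, z in hits if z > min_z]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical hits; identifiers are placeholders.
hits = [("p1", 27.4), ("p2", 4.0), ("p3", 3.6), ("p4", 2.9), ("p5", 5.1)]
print(collect_hits(hits))
# [('p1', 27.4, 99), ('p5', 5.1, 95), ('p2', 4.0, 90), ('p3', 3.6, 90)]
```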

Manual searches 

The first 270 FUGUE hits, with Z-scores ranging from 27.4 to 4.0, were analysed. For each protein sequence, the coding sequence and 1 kb up- and downstream were extracted from the NCBI database. This extracted locus sequence was then analysed using Tandem Repeat Finder (26); default settings were used for the initial searches (match, mismatch, indels = 2, 7, 7; min score = 50), followed by less stringent searches for those homologues with more variable antitoxin sequences (match, mismatch, indels = 2, 3, 5; min score = 50). If repeats were identified, the locus sequence was then examined for a palindromic repeat and an E. coli consensus promoter sequence. Toxin amino acid sequences from the resulting positive hits were then aligned against each other using BLASTp (22) in order to group them into families of related sequences.
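
The locus extraction and the two Tandem Repeat Finder passes might be scripted along the following lines. This is a sketch under stated assumptions: coordinates are treated as linear (circular replicons are not wrapped), and the positional argument order follows the usage line of the TRF command-line program (trf File Match Mismatch Delta PM PI Minscore MaxPeriod [options]). The PM, PI and MaxPeriod values (80, 10, 500) are TRF's commonly recommended settings and are assumed here, because the text reports only the match, mismatch, indel and minimum score values.

```python
def flank_coordinates(cds_start, cds_end, genome_length, flank=1000):
    """0-based, half-open slice covering a CDS plus `flank` bp on each side.

    cds_start and cds_end are 1-based inclusive (GenBank-style) coordinates;
    the slice is clamped to the ends of a linear replicon.
    """
    start = max(0, cds_start - 1 - flank)
    end = min(genome_length, cds_end + flank)
    return start, end


def trf_arguments(fasta_path, match, mismatch, indel, min_score=50,
                  match_prob=80, indel_prob=10, max_period=500):
    """Argument list for one TRF run (PM, PI and MaxPeriod defaults are assumptions)."""
    return ["trf", fasta_path, str(match), str(mismatch), str(indel),
            str(match_prob), str(indel_prob), str(min_score), str(max_period),
            "-d", "-h"]  # -d: write a data file; -h: suppress HTML output


default_run = trf_arguments("locus.fa", 2, 7, 7)   # initial search
relaxed_run = trf_arguments("locus.fa", 2, 3, 5)   # less stringent search
print(flank_coordinates(1500, 2100, 1_000_000))    # (499, 3100)
print(" ".join(relaxed_run))
```

On a system where trf is installed, the returned lists could be passed to subprocess.run(..., check=True); the flags should be checked against the installed TRF version.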
Examples from each family were then used in iterative rounds of BLASTp searches until all homologues had been identified from the NCBI database (current as of July 2011). Information about each identified hit, including sequence information, is stored within a searchable spreadsheet, as Supplementary Table S1. Full details of the 880 FUGUE hits, their analysis and the subsequent BLAST searches are stored within a second spreadsheet, as Supplementary Table S2.
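
The iterative BLASTp expansion amounts to searching repeatedly until no new homologues appear. In the sketch below, `search` stands in for a hypothetical wrapper around a BLASTp run that returns hit identifiers; in the study each new hit was also screened against the Type III locus criteria, a step this sketch omits.

```python
from collections import deque


def expand_homologues(seeds, search):
    """Breadth-first expansion: query every newly found protein until no new hits appear."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:  # only proteins not seen before become new queries
                found.add(hit)
                queue.append(hit)
    return found


# Hypothetical search results: query -> proteins it retrieves.
mock_results = {"ToxN": ["A", "B"], "A": ["ToxN", "C"], "C": ["D"], "E": ["ToxN"]}
print(sorted(expand_homologues(["ToxN"], lambda q: mock_results.get(q, []))))
# ['A', 'B', 'C', 'D', 'ToxN']
```

Protein "E" is not recovered because no protein in the expanding set retrieves it, which illustrates that the final list depends on the seeds chosen from each family.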

Phylogenetics 

Phylogenetic analysis was performed on 69 toxin sequences which were part of putative Type III TA loci unambiguously containing all the required sequence elements (as outlined in the 'Results' section), together with sequences of the Type II toxins Kid, MazF, YdcE, CcdB and RelE. The alignment of these 74 protein sequences was performed using Clustal Omega (27). Additionally, 16S rDNA sequences were retrieved from the Ribosomal Database Project (28) for the strains encoding the corresponding putative Type III TA loci. Maximum Likelihood phylogenetic analysis was performed using TREEFINDER (http://www.treefinder.de) (29). TREEFINDER was first used to select an appropriate model to analyse the aligned datasets. Following the Akaike information criterion, the VT substitution model was chosen for the ToxN sequences, while the GTR model was used for the 16S sequences. Phylogenies were reconstructed using the default settings of TREEFINDER. LR-ELW (Local Rearrangement-Expected-Likelihood Weights) edge support (using 1000 replicates) was enabled to provide approximate bootstrap analysis.
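
Model selection by the Akaike information criterion ranks candidate models by AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximised likelihood; the model with the lowest AIC is preferred. TREEFINDER performs this comparison internally, so the sketch below only illustrates the criterion, using hypothetical log-likelihoods and counting only substitution-model parameters.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def select_model(candidates):
    """Return the model name with the lowest AIC.

    `candidates` maps model name -> (maximised log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fits to one alignment; k counts substitution-model parameters only.
fits = {"JC": (-5230.0, 0), "HKY": (-5100.0, 4), "GTR": (-5080.0, 8)}
print({model: aic(*fit) for model, fit in fits.items()})
# {'JC': 10460.0, 'HKY': 10208.0, 'GTR': 10176.0}
print(select_model(fits))  # GTR
```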

Alignment of ToxI

Single repeat consensus sequences of ToxI homologues were manually aligned with ToxI from P. atrosepticum, according to sequence and to the length and spacing of potential base-pairing regions. Nested base-pairing regions in each ToxI sequence were identified using pknotsRG (30). The start and end of each predicted ToxI pseudoknot were assigned from the relative position of structural features and are not anchored to the ToxI DNA repeat.
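
Pseudoknot predictions are commonly written in bracket notation, with a second bracket type marking the helix that crosses the first. The sketch below converts such a string into base-pair coordinates and tests whether any two pairs cross (i < k < j < l), which is what distinguishes a pseudoknot from a purely nested structure. The bracket characters accepted here are an assumption; the exact notation emitted by pknotsRG should be checked before reusing the code.

```python
def base_pairs(structure, brackets=("()", "[]", "{}")):
    """Return sorted 0-based (i, j) base pairs from a bracket string.

    Each bracket type is matched on its own stack, so helices written with
    different bracket types are allowed to cross, as in a pseudoknot.
    """
    openers = {b[0]: b for b in brackets}
    closers = {b[1]: b for b in brackets}
    stacks = {b: [] for b in brackets}
    pairs = []
    for i, ch in enumerate(structure):
        if ch in openers:
            stacks[openers[ch]].append(i)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
    unclosed = [b for b, s in stacks.items() if s]
    if unclosed:
        raise ValueError(f"unclosed brackets: {unclosed}")
    return sorted(pairs)


def is_pseudoknotted(pairs):
    """True if any two base pairs cross (i < k < j < l)."""
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)


knot = base_pairs("[[..{{..]]..}}")
print(knot)                                       # [(0, 9), (1, 8), (4, 13), (5, 12)]
print(is_pseudoknotted(knot))                     # True
print(is_pseudoknotted(base_pairs("((..))")))     # False
```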

Bacterial strains, bacteriophages and media

Bacterial strains

All functional assays were performed within E. coli strain DH5α (Gibco/BRL). Genomic DNA was extracted from 10 ml overnight cultures of Photorhabdus luminescens subsp. laumondii TT01 (31). For further details about each strain, see Supplementary Table S3.

Bacteriophages 

All bacteriophages used were isolated from treated sewage 
effluent, collected from a river Cam outlet at Milton, near 
Cambridge, UK (Supplementary Table S3). 

Media 

E. coli strains were grown at 37°C and P. luminescens was grown at 30°C, in Luria broth (LB) at 250 rpm or on LB-agar (LBA) containing 1.5% or 0.35% (w/v) agar, to make LBA plates or top-LBA, respectively. Growth (OD600) was measured using a Helios α spectrophotometer set to 600 nm. When required, media were supplemented with ampicillin (Ap) at 10 µg/ml, spectinomycin (Sp) at 50 µg/ml, D-glucose (glu) at 0.2% (w/v), L-arabinose (L-ara) at 0.1% (w/v) and isopropyl-β-D-thiogalactopyranoside (IPTG) at 1 mM. Bacteriophages were stored over chloroform in phage buffer: 10 mM Tris-HCl, pH 7.4, 10 mM MgSO4 and 0.01% (w/v) gelatin.

Cloning of Type III toxin-antitoxin loci 

Molecular biology techniques were performed as described previously (32). All primers were obtained from Sigma-Genosys and are listed in Supplementary Table S4. All plasmids constructed and/or used in this study are listed in Supplementary Table S5. The molecular nature of each recombinant plasmid was verified by DNA sequencing. Genomic DNA was obtained from P. luminescens using an extraction kit (Qiagen). The genomic DNA from other strains was kindly provided by other researchers (see 'Acknowledgements' section).



Toxin cloning 

Each toxin sequence was amplified by PCR using genomic DNA as a template and cloned into pBAD30 (33), using the designated primers and restriction enzymes (Supplementary Tables S4 and S5). Each toxin was cloned such that the protein was translated using the native ribosome binding site. Transformants of pBAD30-based vectors were selected on LBA supplemented with ampicillin and glucose.

Antitoxin cloning 

An expression vector carrying a single repeat of the P. luminescens tenpI antitoxin was generated by first amplifying the required insert using primers TRB214 and PF185, with pQE-80L as a template. The resulting amplicon was then cloned into pTA100, using PstI and XhoI. For all other antitoxins, the sequence was amplified by PCR using genomic DNA as a template and cloned into pTA100 (9), using the designated primers and restriction enzymes (Supplementary Tables S4 and S5).

Type III toxin-antitoxin locus cloning 

Each Type III toxin-antitoxin locus was cloned into pBR322 (34), using amplicons generated by PCR amplification from genomic DNA, with the designated primers and restriction enzymes (Supplementary Tables S4 and S5). The region cloned for each locus includes up to 500 bp upstream of the toxin start codon and <100 bp downstream of the toxin stop codon.

Bacteriophage isolation 

A 10 ml sample of treated sewage effluent was shaken vigorously with 500 µl of chloroform for 1 min. A 200-µl aliquot of this treated sample was mixed with 200 µl of a DH5α overnight culture and 3 ml of top-LBA, and then poured as an overlay on an LBA plate. These plates were incubated overnight at 37°C and the resulting phage plaques were picked with sterile toothpicks into 50 µl of phage buffer, which was then treated with 20 µl of chloroform.

Protection assays 

Strains of E. coli containing both antitoxin and toxin (or control) expression plasmids were obtained by either sequential or co-transformation. Single colonies of the resulting strains were grown as 10 ml overnight cultures, and these were used to inoculate 25 ml of LB, Ap and glu in 250 ml flasks, which were then grown at 37°C and 250 rpm in an orbital shaker, from a starting OD600 of ~0.04, until exponential phase [~1 × 10^8 colony forming units (cfu)/ml]. At this end point, samples were removed, washed with phosphate buffered saline, serially diluted and plated for viable counts at 37°C on LBA, Ap, Sp plates containing either (i) glu, so that neither toxin nor antitoxin is expressed; (ii) glu and IPTG, to express the antitoxin; (iii) L-ara, to express the toxin; or (iv) L-ara and IPTG, to express both the toxin and antitoxin. The data presented are the mean viable counts from triplicate data, with error bars representing the standard deviation.
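
The viable counts behind each bar can be computed as sketched below: colony numbers are scaled by the dilution and the plated volume to give cfu/ml, and triplicates are summarised as a mean and standard deviation. The colony numbers, the 10^-6 dilution and the 0.1 ml plating volume are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def cfu_per_ml(colonies, dilution_exponent, volume_ml):
    """Viable count for a plate spread with `volume_ml` of a 10^-n dilution (n = dilution_exponent)."""
    return colonies * 10 ** dilution_exponent / volume_ml


def summarise(values):
    """Mean and sample standard deviation of replicate viable counts."""
    return mean(values), stdev(values)


# Hypothetical colony counts from three replicate cultures (10^-6 dilution, 0.1 ml plated).
counts = [cfu_per_ml(c, 6, 0.1) for c in (52, 48, 50)]
avg, sd = summarise(counts)
print(f"{avg:.2e} +/- {sd:.1e} cfu/ml")
```
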
Efficiency of Plating assays

Isolated phage plaques were repeatedly re-plated until plaque-pure and homogeneous plaque morphologies were reproducibly obtained. Lysates of the phage were then made by scraping the top-LBA from a plate with a confluent lawn of phage plaques into a glass universal, adding 3 ml of phage buffer and vortexing vigorously with 500 µl of chloroform for 1 min. After standing at room temperature for 30 min, the agar mix was centrifuged at 2200 × g for 20 min at 4°C. The supernatant was decanted into a glass bijou for phage lysate storage at 4°C, over 100 µl chloroform. Phage lysates were then titrated on the E. coli pBR322 control strain and on test strains. The resulting Efficiency of Plating (EOP) was calculated as (number of plaque-forming units on test strain/number of plaque-forming units on control strain). Each EOP value was calculated as the mean value from triplicate data.
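
The EOP formula translates directly into code. This sketch assumes that replicate i on the test strain is paired with replicate i on the pBR322 control and that the three per-replicate ratios are averaged; the titres are made up. An EOP well below 1 means that fewer plaques form on the test strain than on the control.

```python
from statistics import mean


def efficiency_of_plating(test_pfu, control_pfu):
    """Mean of per-replicate EOP ratios (pfu on test strain / pfu on control strain).

    Pairing replicate i on the test strain with replicate i on the control
    strain is an assumption; the text states only that each EOP is the mean
    of triplicate data.
    """
    if len(test_pfu) != len(control_pfu) or not test_pfu:
        raise ValueError("need matching, non-empty lists of titres")
    return mean(t / c for t, c in zip(test_pfu, control_pfu))


# Hypothetical titres (pfu/ml) on the control strain and on a test strain.
control = [2.0e9, 2.0e9, 2.0e9]
test = [2.0e4, 4.0e4, 3.0e4]
print(f"EOP = {efficiency_of_plating(test, control):.1e}")
```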

RESULTS 

Structure-based homology searches and identification 
of Type III toxin-antitoxin loci 

To date, only one Type III TA locus has been characterized in detail: the ToxIN system, isolated from plasmid pECA1039 of P. atrosepticum SCRI1039 (9). That publication also reported, as proof-of-principle, that a homologous ToxIN system from Bacillus thuringiensis acted as a Type III TA system in E. coli (9). A catalogue of 19 other homologues of ToxN was identified from the NCBI database using BLASTp searches (9,22). The current study set out to extend this minimal list of Type III TA systems substantially.

The structure of ToxIN was solved by crystallographic analysis (13), so it was possible to use the program FUGUE (23) to perform structure-based homology searches with ToxN as the search model. FUGUE searches were performed in June 2010 against 1852 bacterial and archaeal proteomes taken from integr8 (25) (http://www.ebi.ac.uk/integr8/). In total, >6 million sequences were scanned. This generated an initial list of 880 putative ToxN homologues, with Z-scores ranging from 27.4 to 3.5 (Supplementary Table S2).

To identify the subset of putative ToxN homologues likely to come from true Type III TA loci, a list of criteria was set, based on the required features of a Type III TA locus as determined experimentally (9,13,21). Namely, the putative toxN homologue should be preceded by a short palindromic repeat to act as a transcriptional terminator, which should, in turn, be preceded by a tandem array of nucleotide repeats to act as the antitoxic toxI. Preferably, this locus should also be preceded by -35 and -10 promoter elements, positioned relative to the start of toxI. Because promoters became more difficult to predict in bacteria more distantly related to E. coli, the presence of an obvious promoter was not treated as a strict requirement.

No algorithm was immediately available to allow rapid processing of the FUGUE hits, so the analysis was undertaken on a case-by-case basis. Each nucleotide sequence of the identified ToxN homologues was taken together with 500 bp upstream and downstream, and attempts were made to identify the cognate toxI using Tandem Repeat Finder (26). The default settings of Tandem Repeat Finder proved adequate to predict the ToxI for each of the previously identified homologues accurately (9), so these settings were used for all initial searches. Where a suitable toxI-like sequence was identified, the sequence was then screened for a putative transcriptional terminator and promoter by visual inspection. If a toxI-like sequence was not identified, the sequence was not examined any further, and these negative hits are recorded within Supplementary Table S2. In this manner, we examined the first 270 FUGUE hits. At this point, the Z-score had reached 4.0 and was slowly decreasing towards 3.5 (Supplementary Figure S1). There were rapidly diminishing returns in identifying Type III loci from the FUGUE hits as we progressed to Z-scores <5.0 (Supplementary Figure S1). Analysis of the raw FUGUE hits was therefore ended after the first 270 hits, by which point 37 putative Type III loci had been identified.
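
The terminator check was done by visual inspection, but the underlying test, an inverted repeat that could fold into a stem-loop, can be sketched in a few lines. This is a simplified illustration and not the procedure used in the study: it looks only for exact stems of fixed length separated by a short loop, whereas real terminators tolerate mismatches and G-U pairs and are better assessed by folding energy. The stem and loop limits and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=8, min_loop=3, max_loop=10):
    """Return (left_start, right_start, loop_length) for exact inverted repeats.

    A `stem`-nt segment at position i is reported when its reverse complement
    starts after a loop of min_loop..max_loop nt (0-based coordinates).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # larger loops would run past the end of the sequence
            if seq[j:j + stem] == target:
                hits.append((i, j, loop))
    return hits


# Made-up sequence: an 8-nt stem, a 4-nt loop and the complementary stem.
example = "TTT" + "GAAGCCCG" + "AAAA" + "CGGGCTTC" + "TTTTTT"
print(find_inverted_repeats(example))
```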

Three families of Type III toxin-antitoxin loci were identified

A BLASTp matrix was formed to compare the amino acid sequence of each of the 37 ToxN structural homologues with every other sequence; it became immediately clear from the returned E-values that the 37 hits could be divided into three families (Supplementary Figure S2). Though all the hits by definition share the same overall tertiary structure, members of the second and third families had little detectable amino acid sequence identity with either ToxN or each other (Supplementary Figure S2), which supports the decision to classify them into three independent families. The first family contained ToxIN from P. atrosepticum, B. thuringiensis and all homologues thereof. When naming the families, it was decided to maintain the 'IN' nomenclature, wherein every antitoxin has the suffix 'I' for inhibitor and each toxin is denoted 'N', as for the toxIN family. This standardization, if used universally, would ensure that when Type III TA systems are identified in future, the reader can readily conclude which component is the antitoxin and which is the toxin. The second family contained a locus from Coprococcus catus GD/7, so the family was named cptIN (CoPrococcus Type III Inhibitor/toxiN; suggested pronunciation, 'cap-tin'). The third family contained a locus from P. luminescens subsp. laumondii TT01, so this family was named tenpIN (Type III ENdogenous to Photorhabdus Inhibitor/toxiN).
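
Grouping hits from an all-against-all BLASTp comparison can be approximated by single-linkage clustering: any pair with an E-value below a cut-off is joined, and connected proteins form a family. The sketch below does this with a union-find structure. The protein labels, E-values and the 1e-5 cut-off are hypothetical, since the text reports the outcome of the matrix rather than a numeric threshold.

```python
def cluster_by_evalue(names, evalues, cutoff=1e-5):
    """Single-linkage families from an all-against-all BLASTp E-value table.

    `evalues` maps (query, subject) -> E-value; a pair with E < cutoff links
    the two proteins, and linked proteins end up in the same family.
    """
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for (query, subject), e in evalues.items():
        if e < cutoff:
            parent[find(query)] = find(subject)

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())


names = ["ToxN_Pa", "ToxN_Bt", "CptN_Cc", "CptN_Er", "TenpN_Pl"]
evalues = {  # hypothetical E-values
    ("ToxN_Pa", "ToxN_Bt"): 1e-40,
    ("CptN_Cc", "CptN_Er"): 1e-30,
    ("ToxN_Pa", "CptN_Cc"): 0.5,
    ("TenpN_Pl", "ToxN_Bt"): 2.0,
}
print(cluster_by_evalue(names, evalues))
# [['CptN_Cc', 'CptN_Er'], ['TenpN_Pl'], ['ToxN_Bt', 'ToxN_Pa']]
```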

Having divided the 37 hits into three families, we took examples from each family and performed exhaustive BLASTp searches. The results from these searches were again assessed for hits that represented putative Type III TA systems, following the same criteria as described previously for the FUGUE hits (Supplementary Table S2). Negative hits were then re-visited using less stringent settings of Tandem Repeat Finder to identify Type III loci with greater variability between their antitoxin repeat sequences. A final list of 125 putative Type III TA systems was generated. A consolidated version of this list is presented in Table 1. A more comprehensive, searchable spreadsheet of these hits, including all the nucleotide and protein sequences, is available as Supplementary Table S1.

Dividing the 125 hits back into families, we had 67 examples of toxIN loci, 33 of cptIN and 25 of tenpIN. We found, for the first time, that multiple copies of the same family can exist within one host, such as within Eubacterium rectale ATCC 33656, Fusobacterium sp. 3_1_5R, Lactobacillus jensenii 208-1 and Peptoniphilus duerdenii ATCC BAA-1640 (Table 1). Furthermore, multiple families may be represented within a single host, as is the case for Bryantella formatexigens DSM 14469, two Clostridia, E. rectale DSM 17629, three fusobacteria, Haemophilus somnus 2336, Leptotrichia hofstadii F0254 and Leptotrichia goodfellowii F0264. Leptotrichia goodfellowii F0264 contained the highest number of Type III TA loci, six in total, with at least one representative from each of the three families (Table 1).
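
Tallies such as these (67 + 33 + 25 = 125 loci), and the lists of hosts carrying repeated or mixed families, can be derived from per-locus records. The sketch below assumes a simple list of (host, family) pairs with hypothetical host names; in practice such records could be taken from Supplementary Table S1.

```python
from collections import Counter, defaultdict


def host_summary(loci):
    """Summarise (host, family) records, one record per putative Type III locus.

    Returns per-family totals, hosts with more than one locus of the same
    family, and hosts with loci from more than one family.
    """
    loci = list(loci)  # the records are iterated twice below
    per_family = Counter(family for _, family in loci)
    by_host = defaultdict(Counter)
    for host, family in loci:
        by_host[host][family] += 1
    repeated_family = sorted(h for h, fam in by_host.items() if max(fam.values()) > 1)
    multi_family = sorted(h for h, fam in by_host.items() if len(fam) > 1)
    return per_family, repeated_family, multi_family


records = [  # hypothetical hosts, for illustration only
    ("host_1", "toxIN"), ("host_1", "toxIN"),
    ("host_2", "cptIN"), ("host_2", "tenpIN"),
    ("host_3", "toxIN"),
]
totals, repeated, mixed = host_summary(records)
print(dict(totals), repeated, mixed)
```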

The majority of hits were found on bacterial chromosomes and plasmids, though this analysis identified a toxIN locus encoded on a plasmid prophage, Clostridium phage D-1873. This converting phage carries the neurotoxin cluster encoding a major pathogenicity determinant of Clostridium botulinum. Type I and Type II loci have also been identified on phages, such as the Hok/Gef Type I system of the enterobacterial phage 933W (12) and the Type II phd/doc locus of P1 (35).

Each of the hits contained in Table 1 is unique, in that the individual locus is contained within a unique host. However, there are cases where exact copies of the same Type III TA locus have been detected in multiple host strains. This was seen with the toxIN locus of Ruminococcus torques ATCC 27756 and Lachnospiraceae bacterium 8_1_57FAA. Similarly, Lactobacillus jensenii 208-1 contains two toxIN loci; one is duplicated in Lactobacillus jensenii 27-2-CHN, while the other is duplicated in both Lactobacillus jensenii 269-3 and Lactobacillus jensenii 1153. The tenpIN loci identified within the two Vibrio species, 0593 and MZO-3, are also identical except for a single silent base substitution in the toxin-coding sequence.

The criteria for identification of each putative Type III TA locus required a toxin gene to be preceded by a terminator and, further upstream, a set of tandem repeats. While the majority of hits have relatively small distances between the repeats, the terminator and the toxin gene (around 1-50 bp between each component), there were some examples with larger intervening distances. The toxIN locus of Actinobacillus ureae ATCC 25976 has a gap of 340 bp between toxI and toxN. The toxIN loci of Roseburia intestinalis M50/1 and Roseburia intestinalis XB6B4 have similar gaps, of 232 and 292 bp, respectively. No ORFs or other features of note were detected within these gaps.
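
The intervening distances discussed above follow directly from the coordinates of each component. The sketch below computes them and flags gaps outside the roughly 1-50 bp range seen for most hits; the coordinates are hypothetical and assume that all components lie on the same strand in the canonical order.

```python
def intervening_distances(repeats_end, terminator_start, terminator_end, toxin_start):
    """Gaps (bp) between consecutive locus components.

    Coordinates are 1-based and inclusive on the coding strand; 0 means the
    components are adjacent, and a negative value means they overlap or are
    out of the expected order.
    """
    return {
        "repeats_to_terminator": terminator_start - repeats_end - 1,
        "terminator_to_toxin": toxin_start - terminator_end - 1,
    }


def unusual_gaps(distances, typical_max=50):
    """Gaps that are negative or longer than `typical_max` bp."""
    return {name: gap for name, gap in distances.items() if gap < 0 or gap > typical_max}


typical = intervening_distances(298, 310, 340, 350)  # hypothetical compact locus
spaced = intervening_distances(298, 310, 340, 650)   # hypothetical locus with a long gap
print(typical, unusual_gaps(typical))
print(spaced, unusual_gaps(spaced))
```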

Whereas many Type II TA loci have been identified in Archaea (5), previous studies have not identified any examples of either Type I or Type III TA loci in this third superkingdom (9,12). Our new analysis has identified one putative cptIN locus within the archaeon Methanococcus vannielii SB. This system also has a larger than expected gap of 242 bp separating the antitoxic repeats and toxin gene. These hits were retained within the consolidated list of putative systems (Table 1 and Supplementary Table S1) as items of interest for further study and validation.

The toxIN locus of Lactobacillus jensenii SJ-7A-US resembles the Type II higBA TA locus (36), in that the canonical arrangement of toxin and antitoxin genes is swapped: the antitoxic toxI repeats and terminator appear to be downstream of the toxN gene. Though a predicted promoter downstream of this toxN homologue could potentially drive transcription of the cognate toxI, it is unclear, under the current model, how this locus might successfully regulate the levels of the two species. This will be a focus of future study.

Several hits within our analysis have toxin sequences that represent partial sequences of other, longer toxins. These smaller toxins have full sets of antitoxic repeats and terminators, but only shorter versions of the toxin genes. It is unclear whether these partial proteins are still toxic or possess a different cellular activity. Two of the three cptIN loci of Bryantella formatexigens DSM 14469 encode toxins of 66 and 45 amino acids, respectively, and the single cptIN loci of Lachnospiraceae bacterium 9_1_43BFAA (54 amino acids), Methanococcus vannielii SB (101 amino acids) and Pyramidobacter piscolens W5455 (81 amino acids) also encode shorter toxins, all of which share sequence similarity with the 162-amino-acid EUBREC_0659 protein from E. rectale ATCC 33656. The toxin from the single tenpIN locus of Leptotrichia hofstadii F0254 (70 amino acids) is a shortened version of the TenpN toxin from P. luminescens (143 amino acids). These short proteins could represent non-toxic forms of the longer homologues, or they could have arisen through sequencing errors within draft genomes that would result in premature termination of the predicted proteins.



Analysis of the three Type III toxin-antitoxin families 

Selected features of the sequences within each of the three families of Type III TA loci are summarized in Table 2. The summary shows that, although the two original toxIN loci (from P. atrosepticum and B. thuringiensis) were found on plasmids (9), the majority of Type III TA loci are encoded in the host chromosome. This may reflect the focus of sequencing projects on chromosomes rather than extrachromosomal elements. In the case of cptIN, however, there were no examples on any replicon other than the chromosome.

Comparing the antitoxin sequences of the toxIN family
with those of cptIN and tenpIN reveals a general shift
towards progressively fewer, but longer, antitoxic
repeats. The toxin size, however,



676^ Nucleic Acids Research, 2012, Vol. 40, No. 13 



Table 1. Distribution of three identified Type III toxin-antitoxin families

Strain | Plasmid | Abbreviation | Taxonomy (Phylum, Class, Order)
Abiotrophia defectiva ATCC 49176 |  | Ade | Firmicutes, Bacilli, Lactobacillales
Acetivibrio cellulolyticus CD2 |  |  | Firmicutes, Clostridia, Clostridiales
Acidobacterium sp. MP5ACTX9 | pACIX905 |  | Acidobacteria, Acidobacteria, Acidobacteriales
Actinobacillus ureae ATCC 25976 |  |  | Proteobacteria, Gammaproteobacteria, Pasteurellales
Alkaliphilus oremlandii OhILAs |  | Aor | Firmicutes, Clostridia, Clostridiales
Anoxybacillus flavithermus WK1 |  | Afl | Firmicutes, Bacilli, Bacillales
Bacillus cereus Rock1-15 |  | Bce | Firmicutes, Bacilli, Bacillales
Bacillus thuringiensis serovar konkukian str. 97-27 | pBT9727 |  | Firmicutes, Bacilli, Bacillales
Bacillus thuringiensis serovar kurstaki | pAW63 | Bthsk | Firmicutes, Bacilli, Bacillales
Bacillus thuringiensis serovar pondicheriensis BGSC 4BA1 |  | Bthsp | Firmicutes, Bacilli, Bacillales
Bacillus weihenstephanensis KBAB4 | pBWB402 | Bwe | Firmicutes, Bacilli, Bacillales
Bryantella formatexigens DSM 14469 |  | Bfo | Firmicutes, Clostridia, Clostridiales
Caldicellulosiruptor bescii DSM 6725 |  | Cbe | Firmicutes, Clostridia, Thermoanaerobacterales
Caldicellulosiruptor kristjanssonii 177R1B |  | Ckr | Firmicutes, Clostridia, Thermoanaerobacterales
Caldicellulosiruptor lactoaceticus 6A |  |  | Firmicutes, Clostridia, Thermoanaerobacterales
Clostridium botulinum BKT015925 | p1BKT015925 | Cbo | Firmicutes, Clostridia, Clostridiales
Clostridium cellulovorans 743B |  | Cce | Firmicutes, Clostridia, Clostridiales
Clostridium hiranonis DSM 13275 |  | Chi | Firmicutes, Clostridia, Clostridiales
Clostridium nexile DSM 1787 |  |  | Firmicutes, Clostridia, Clostridiales
Clostridium phage D-1873 |  | CloΦD | Firmicutes, Clostridia, Clostridiales
Clostridium sp. HGF2 |  | Clh | Firmicutes, Clostridia, Clostridiales
Coprobacillus sp. 29_1 |  |  | Firmicutes, Erysipelotrichi, Erysipelotrichales
Coprococcus catus GD/7 |  | Cca | Firmicutes, Clostridia, Clostridiales
Coprococcus sp. ART55/1 |  |  | Firmicutes, Clostridia, Clostridiales
Eubacterium rectale ATCC 33656 |  | Ere33656 | Firmicutes, Clostridia, Clostridiales
Eubacterium rectale DSM 17629 |  | Ere17629 | Firmicutes, Clostridia, Clostridiales
Eubacterium rectale M104/1 |  |  | Firmicutes, Clostridia, Clostridiales
Eubacterium saburreum DSM 3986 |  | Esab | Firmicutes, Clostridia, Clostridiales
Eubacterium saphenum ATCC 49989 |  | Esap | Firmicutes, Clostridia, Clostridiales
Eubacterium ventriosum ATCC 27560 |  | Eve | Firmicutes, Clostridia, Clostridiales
Eubacterium yurii subsp. margaretiae ATCC 43715 |  | Eyu | Firmicutes, Clostridia, Clostridiales
Fibrobacter succinogenes subsp. succinogenes S85 |  | Fsu | Fibrobacteres, Fibrobacteres, Fibrobacterales
Finegoldia magna BVS033A4 |  | Fma | Firmicutes, Clostridia, Clostridiales
Fusobacterium nucleatum subsp. nucleatum ATCC 23726 |  | Fnun | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium nucleatum subsp. polymorphum ATCC 10953 |  | Fnup | Fusobacteria, Fusobacteria, Fusobacteriales

Fusobacterium nucleatum subsp. vincentii |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium periodonticum ATCC |  | Fpe | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium sp. 2_1_31 |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium sp. |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium sp. 3_1_36A2 |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium sp. 3_1_5R |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium sp. 4_1_13 |  | Fus4 | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium sp. 7_1 |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Fusobacterium |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Geobacillus thermoglucosidasius |  |  | Firmicutes, Bacilli, Bacillales
Haemophilus influenzae | pF1947 | Hin | Proteobacteria, Gammaproteobacteria, Pasteurellales
Haemophilus parasuis |  |  | Proteobacteria, Gammaproteobacteria, Pasteurellales
Haemophilus somnus 2336 |  | Hso | Proteobacteria, Gammaproteobacteria, Pasteurellales
Lachnospiraceae bacterium |  |  | Firmicutes, Clostridia, Clostridiales
Lachnospiraceae bacterium |  |  | Firmicutes, Clostridia, Clostridiales
Lachnospiraceae bacterium |  |  | Firmicutes, Clostridia, Clostridiales
Lachnospiraceae bacterium |  |  | Firmicutes, Clostridia, Clostridiales
Lachnospiraceae bacterium |  |  | Firmicutes, Clostridia, Clostridiales
Lachnospiraceae bacterium 9_1_43BFAA |  |  | Firmicutes, Clostridia, Clostridiales
Lachnospiraceae oral taxon 107 str. F0167 |  |  | Firmicutes, Clostridia, Clostridiales
Lactobacillus gasseri JV-V03 |  | Lga | Firmicutes, Bacilli, Lactobacillales
Lactobacillus helveticus DSM 20075 |  |  | Firmicutes, Bacilli, Lactobacillales
Lactobacillus helveticus |  |  | Firmicutes, Bacilli, Lactobacillales
Lactobacillus jensenii 1153 |  | Lje1153 | Firmicutes, Bacilli, Lactobacillales
Lactobacillus jensenii 208-1 |  | Lje208 | Firmicutes, Bacilli, Lactobacillales
Lactobacillus jensenii 269-3 |  | Lje269 | Firmicutes, Bacilli, Lactobacillales
Lactobacillus jensenii 27-2-CHN |  | LjeCHN | Firmicutes, Bacilli, Lactobacillales
Lactobacillus jensenii SJ-7A-US |  | LjeUS | Firmicutes, Bacilli, Lactobacillales
Lactobacillus kefiranofaciens ZW3 |  | Lke | Firmicutes, Bacilli, Lactobacillales
Lactococcus lactis subsp. lactis CV56 | pCV56A | Lla | Firmicutes, Bacilli, Lactobacillales
Lactococcus lactis W-37, protein 'AbiQ' | pSRQ900 | LlaW37 | Firmicutes, Bacilli, Lactobacillales
Leptotrichia goodfellowii F0264 |  | Lgo | Fusobacteria, Fusobacteria, Fusobacteriales
Leptotrichia hofstadii F0254 |  |  | Fusobacteria, Fusobacteria, Fusobacteriales
Mahella australiensis 50-1 BON |  | Mau | Firmicutes, Clostridia, Thermoanaerobacterales
Methanococcus vannielii SB |  |  | Euryarchaeota, Methanococci, Methanococcales
Pectobacterium atrosepticum SCRI1039 |  |  | Proteobacteria, Gammaproteobacteria, Enterobacteriales
Peptoniphilus duerdenii ATCC |  | Pdu | Firmicutes, Clostridia, Clostridiales
Peptoniphilus harei ACS-146-V-Sch2b |  |  | Firmicutes, Clostridia, Clostridiales
Phascolarctobacterium sp. YIT 12067 |  | Pha | Firmicutes, Negativicutes, Selenomonadales
Photorhabdus luminescens subsp. laumondii TT01 |  | Plu | Proteobacteria, Gammaproteobacteria, Enterobacteriales
Pyramidobacter piscolens W5455 |  |  | Synergistetes, Synergistia, Synergistales
Roseburia intestinalis M50/1 |  | Rin | Firmicutes, Clostridia, Clostridiales
Roseburia intestinalis XB6B4 |  |  | Firmicutes, Clostridia, Clostridiales
Ruminococcus lactaris ATCC 29176 |  | Rla | Firmicutes, Clostridia, Clostridiales
Ruminococcus sp. |  |  | Firmicutes, Clostridia, Clostridiales
Ruminococcus torques |  | Rto | Firmicutes, Clostridia, Clostridiales
Ruminococcus torques L2-14 |  |  | Firmicutes, Clostridia, Clostridiales
Shewanella putrefaciens 200 |  |  | Proteobacteria, Gammaproteobacteria, Alteromonadales
Staphylococcus aureus HUNSC491 |  |  | Firmicutes, Bacilli, Bacillales
Staphylococcus aureus A9754 |  | Sau | Firmicutes, Bacilli, Bacillales
Staphylococcus aureus subsp. aureus Mu50 | pVRSA |  | Firmicutes, Bacilli, Bacillales
Streptococcus sanguinis SK72 |  | Ssa | Firmicutes, Bacilli, Lactobacillales
Taylorella equigenitalis MCE9 |  | Teq | Proteobacteria, Betaproteobacteria, Burkholderiales
Thermosinus carboxydivorans Nor1 |  | Tca | Firmicutes, Negativicutes, Selenomonadales
Treponema succinifaciens DSM 2489 |  | Tsu | Spirochaetes, Spirochaetes, Spirochaetales
Treponema vincentii ATCC 35580 |  |  | Spirochaetes, Spirochaetes, Spirochaetales
Veillonella atypica ACS-134-V-Col7a |  | Vat | Firmicutes, Negativicutes, Selenomonadales
Veillonella parvula ACS-068-V-Sch12 |  | Vpa | Firmicutes, Negativicutes, Selenomonadales
Vibrio cholerae MZO-3 |  |  | Proteobacteria, Gammaproteobacteria, Vibrionales
Vibrio cholerae O395 |  | Vch | Proteobacteria, Gammaproteobacteria, Vibrionales
Xenorhabdus bovienii SS-2004 |  | Xbo | Proteobacteria, Gammaproteobacteria, Enterobacteriales
Yersinia pseudotuberculosis IP 31758 | p153kb | Yps | Proteobacteria, Gammaproteobacteria, Enterobacteriales
Table 2. Overview of Type III toxin-antitoxin families

Family | Total members of family | Replicon (chromosome / plasmid / phage) | Antitoxin repeat length, nucleotides (mean; range) | Number of tandem antitoxic repeats (mean; range) | Toxin length, amino acids (mean; range)
toxIN | 67 | 57 / 9 / 1 | 38.1; 31-62 | 2.8; 1.9-5.6 | 168.0; 70-288
cptIN | 33 | 33 / 0 / 0 | 44.8; 40-48 | 2.4; 1.9-3.4 | 148.6; 45-213
tenpIN | 25 | 20 / 5 / 0 | 50.9; 39-57 | 2.2; 1.9-3.0 | 164.2; 70-259



[Figure 2 image: pie charts for the toxIN, cptIN and tenpIN families. Key, Phyla: Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes, Synergistetes, Euryarchaeota, Acidobacteria, Fibrobacteres. Orders: Bacillales, Lactobacillales, Clostridiales, Erysipelotrichales, Selenomonadales, Thermoanaerobacterales, Fusobacteriales, Burkholderiales, Enterobacteriales, Pasteurellales, Alteromonadales, Vibrionales, Spirochaetales, Synergistales, Methanococcales, Acidobacteriales, Fibrobacterales.]

Figure 2. Taxonomy of Type III toxin-antitoxin loci. The taxonomic 
distribution of the identified members from each Type III toxin-anti- 
toxin family is shown as a pie chart, with the outer ring representing 
the different Phyla and the inner portion representing the respective 
subdivisions of each Phylum into Orders. Class has been omitted for 
clarity. Colours are as indicated in the inset key. 



stays approximately the same (Table 2). The lower mean
toxin length for cptIN is skewed by the presence of several
small putative toxins, as discussed above.
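As a rough check on this statement, the arithmetic below removes the five short cptIN toxins named earlier (66, 45, 54, 101 and 81 amino acids) from the Table 2 cptIN mean. It assumes those five are the only strongly truncated members and uses the rounded mean from Table 2, so the result is approximate.

```python
# cptIN values from Table 2 (the mean is rounded to one decimal place).
n_members = 33
mean_length = 148.6

# Short cptIN toxins named in the text (amino acids): B. formatexigens (66, 45),
# Lachnospiraceae bacterium 9_1_43BFAA (54), M. vannielii SB (101),
# P. piscolens W5455 (81).
short_toxins = [66, 45, 54, 101, 81]

total_length = n_members * mean_length
remaining_mean = (total_length - sum(short_toxins)) / (n_members - len(short_toxins))
print(round(remaining_mean, 1))  # 162.7
```

The remaining 28 cptIN toxins average about 163 amino acids, close to the toxIN (168.0) and tenpIN (164.2) means.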

Phylogeny of ToxN proteins 

A recent global analysis of Type I TA systems indicated
that horizontal gene transfer contributed little to the
dissemination of these loci (12). In contrast, Type II loci
have been more freely distributed among evolutionarily
unrelated species (5). Table 1 lists the taxonomy of each
host organism, and this distribution is also shown in
Figure 2. The vast majority of Type III TA loci are found
within either the Firmicutes (mainly the Orders Bacillales,
Lactobacillales and Clostridiales) or the Fusobacteria
(Figure 2). For the toxIN and tenpIN families, a
substantial proportion of loci are found in the
Proteobacteria, whereas none of the cptIN loci were found
in this Phylum. Although the original toxIN locus was
found in the enterobacterium Pectobacterium, the other
toxIN loci within the Proteobacteria were found either in
the Pasteurellales or, in one case, in the Burkholderiales.
In contrast, the tenpIN loci identified within the
Proteobacteria included examples from several enteric
bacteria, such as Photorhabdus, Xenorhabdus and
Yersinia, along with loci in the Pasteurellales,
Alteromonadales and Vibrionales.

To assess the impact of horizontal gene transfer on the
spread of Type III TA loci, 69 of the 125 toxin sequences
were aligned and analysed to construct a phylogenetic
tree using Maximum Likelihood (Figure 3). Previous
results identified the Type II toxins MazF (E. coli), Kid
(E. coli plasmid R1), YdcE (Bacillus subtilis) and CcdB
(E. coli F plasmid) as structural homologues of ToxN
(13). These were therefore included in the analysis, as was
RelE (E. coli), the principal member of a Type II toxin
family that has not been identified as structurally similar
to ToxN or Kid/MazF. The resulting dendrogram shows
clear separation between the toxins of the three predicted
Type III TA families (Figure 3). Furthermore, it shows
that the Type III toxins CptN and TenpN share a common
route of divergence away from ToxN (Figure 3). The Type
II endoribonucleases Kid, MazF and YdcE formed a clade
independent of the three Type III families. Interestingly,
however, in this analysis CcdB, a topoisomerase inhibitor,
and RelE were grouped with the ToxN family of proteins,
albeit with large evolutionary distances.
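Tree building of this kind starts from a multiple sequence alignment. The study used Maximum Likelihood in TREEFINDER; the sketch below instead shows a much simpler, distance-based starting point, the uncorrected pairwise p-distance, computed with gapped columns ignored. It is not the method used here, and the three aligned fragments are invented for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues, ignoring columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped columns
        compared += 1
        differing += x != y
    return differing / compared if compared else float("nan")


def distance_matrix(alignment: dict) -> dict:
    """All pairwise p-distances for a {name: aligned_sequence} mapping."""
    return {
        (name_a, name_b): p_distance(seq_a, seq_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2)
    }


# Hypothetical aligned fragments, invented for illustration only.
toy_alignment = {
    "toxN_like": "MKIEF-RSL",
    "cptN_like": "MKVEFGRSL",
    "tenpN_like": "MRVDFGKTL",
}

for pair, d in distance_matrix(toy_alignment).items():
    print(pair, round(d, 3))
```

The p-distance does not correct for multiple substitutions at the same site, which is one reason model-based methods such as Maximum Likelihood are preferred for divergent toxin families.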

When a second phylogenetic tree was constructed using
the 16S rRNA sequences of 44 of the host bacteria, hosts
carrying similar Type III TA loci were not tightly grouped,
suggesting horizontal movement of these loci between
unrelated bacteria (Supplementary Figure S3).



Figure 3. Phylogeny of selected toxin sequences. Sixty-nine toxin sequences from loci unambiguously containing all features of a Type III system
(presence of putative antitoxic repeats, promoter and terminator) were aligned, together with five Type II toxins, and then analysed with TREEFINDER
(29). Where a species has more than one Type III TA system selected in this manner, the number following the underscore
(e.g. Lgo_14) is the reference number for that system (Table 1 and Supplementary Table S1). A 'P' in parentheses indicates that the source TA
system is encoded on a plasmid rather than on the chromosome. Entries from the toxIN family are coloured green, cptIN blue and tenpIN
red, while Type II toxins are in black. The scale bar represents the approximate number of changes per amino acid position as the tree expands
radially.

Alignment of ToxI sequences

The ToxI antitoxin of P. atrosepticum folds as a compact,
hairpin-type pseudoknot with two single-stranded tails.
Each pseudoknot binds and inhibits two ToxN monomers
at distinct surfaces, such that three molecules of the
protein are held in a self-closing, inactive complex by three
ToxI pseudoknots (13). The pseudoknot core comprises
two base-paired stems, stabilized by three base triples
and by interdigitation of a guanine between bases of the
opposite strand (Figure 4A). To determine whether this
structure is conserved within the new toxIN-family
antitoxins, most of which show only limited sequence
similarity, we attempted to align the new ToxI sequences
to the structural template of ToxI from P. atrosepticum
(Figure 4B). The alignment was performed manually,
based on the placement of base-pairing regions and the
lengths of the intervening loops. The precise sequence
corresponding to one antitoxic RNA repeat is not
known for the new homologues, so each consensus
DNA repeat sequence was offset to maximize its
alignment to ToxI from P. atrosepticum. In this way, 39
unique ToxI sequences, corresponding to 52 toxIN
loci, could be aligned to the ToxI structural template.
Examination of the alignment (Figure 4B) indicates that
the pseudoknot structure is conserved within this family:
each ToxI contains two nested base-pairing regions,
typically 3-4 nt in length. The spacing of these elements
is also conserved, with a general pattern of a
medium-length (3-4 nt) 'Loop 1a-2a', a short (1-2 nt)
'Loop 2a-1b' and a longer, variable-length 'Loop 1b-2b'
separating the base-pairing sequences (Figure 4A and B).
A short Loop 2a-1b is a common feature of RNA
pseudoknots because it allows coaxial stacking of the two
stem-loops to form a compact helical core (37). The length
of Loop 1b-2b was used to divide the ToxI homologues
into two groups, defined as <7 nt for Group I and >7 nt
for Group II, with the exception of entry 70 from
Clostridium sp. HGF2, which contains a 12 nt hairpin
insertion in this loop. Because of the sequence variability
of the loop regions, it was not possible to determine
whether the base-triple interactions in the P. atrosepticum
ToxI pseudoknot are conserved; however, over half of the
aligned ToxI sequences retained the interdigitated guanine
preceding stem region 2b.
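The layout described above (1a, Loop 1a-2a, 2a, Loop 2a-1b, 1b, Loop 1b-2b, 2b, with 1a pairing to 1b and 2a pairing to 2b) can be checked mechanically. The sketch below is a simplified model that assumes Watson-Crick and G·U pairs only and ignores the single-stranded tails, base triples and the intercalated guanine; the 20-nt RNA is invented for illustration.

```python
# Watson-Crick pairs plus G·U wobble pairs.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def is_stem(rna: str, start_5: int, start_3: int, length: int) -> bool:
    """True if rna[start_5:start_5+length] pairs antiparallel with rna[start_3:start_3+length]."""
    five = rna[start_5:start_5 + length]
    three = rna[start_3:start_3 + length][::-1]  # read the partner strand 3'->5'
    if len(five) != length or len(three) != length:
        return False
    return all((a, b) in PAIRS for a, b in zip(five, three))


def h_type_pseudoknot_fits(rna, len_1, len_2, loop_1a_2a, loop_2a_1b, loop_1b_2b):
    """Check the layout 1a-loop-2a-loop-1b-loop-2b, with stem 1 = 1a:1b and stem 2 = 2a:2b."""
    p1a = 0
    p2a = p1a + len_1 + loop_1a_2a
    p1b = p2a + len_2 + loop_2a_1b
    p2b = p1b + len_1 + loop_1b_2b
    return is_stem(rna, p1a, p1b, len_1) and is_stem(rna, p2a, p2b, len_2)


# Hypothetical 20-nt RNA: 1a=GCG, loop AA, 2a=CCA, loop A, 1b=CGC, loop AAAAA, 2b=UGG
toy = "GCGAACCAACGCAAAAAUGG"
print(h_type_pseudoknot_fits(toy, 3, 3, 2, 1, 5))  # True
```

Because the two stems interleave rather than nest in sequence order, a single mismatch in either stem (for example changing the U of 2b to C) makes the check fail.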

Nine of the ToxI antitoxins could not be aligned
because the spacing between their predicted pseudoknot
base-pairing regions did not match the overall pattern in
the alignment. These putative ToxI sequences may have
different pseudoknot structures, or they may not encode
functional antitoxins. Another three antitoxins could not
be aligned because they are ~60 nt long, in contrast to
the 31-46 nt of the aligned ToxI sequences.
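Two steps of the alignment procedure can be sketched in code. Because repeats are tandem, the boundary of a consensus DNA repeat is arbitrary, and any cyclic rotation of it occurs within an array of two or more copies; the first function therefore picks the rotation with the highest ungapped identity to a template. The authors aligned manually using base-pairing regions, so this identity score is only a stand-in, and the sequences are hypothetical. The second function applies the Loop 1b-2b rule as stated in the text; the text does not say which group a loop of exactly 7 nt belongs to, so that case is left unassigned.

```python
def identity(a: str, b: str) -> float:
    """Ungapped fraction of identical positions over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def best_rotation(repeat: str, template: str) -> tuple:
    """Return (offset, rotated_repeat) with the highest identity to the template."""
    rotations = [repeat[i:] + repeat[:i] for i in range(len(repeat))]
    best = max(range(len(rotations)), key=lambda i: identity(rotations[i], template))
    return best, rotations[best]


def toxi_group(loop_1b_2b_nt: int) -> str:
    """Group rule as stated in the text: <7 nt is Group I, >7 nt is Group II."""
    if loop_1b_2b_nt < 7:
        return "I"
    if loop_1b_2b_nt > 7:
        return "II"
    return "unassigned"  # the stated rule does not cover exactly 7 nt


# Hypothetical template and a consensus repeat that starts at a different boundary.
template = "AUGCGUUACG"
repeat = "CGUUACGAUG"
print(best_rotation(repeat, template))  # (7, 'AUGCGUUACG')
print(toxi_group(4), toxi_group(12))    # I II
```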
Figure 4. Alignment of putative pseudoknot elements of toxIN antitoxins. (A) Structure of a single pseudoknot repeat of P. atrosepticum ToxI
(left, PDB: 2XDD), and schematic of secondary and tertiary interactions within each ToxI unit (right). (B) Alignment of consensus repeat sequences
of the ToxIN family antitoxins. Stem-loops 1 and 2 are shown in red and teal, respectively; additional potential base-pairing regions are underlined.
The intercalated G19 of ToxI is shown in purple. Reference numbers in brackets indicate entries with identical ToxI consensus repeat sequences,
as listed in Supplementary Table S1. Entries 10, 11 and 30 could not be aligned because of their overall length (~60 nt). Entries 15 (69), 17 (39), 55, 58, 67,
118, 120 (121), 123 and 125 all contained two nested base-pairing regions of >3 nt each, but could not be aligned because the loop lengths between
base-pairing regions did not match the pattern of either Group I or Group II. Strain abbreviations can be related back to entries in Table 1 and
Supplementary Table S1.
The cptIN and tenpIN antitoxin sequences are generally
longer than the ToxI sequences (Table 2). The CptI and
TenpI antitoxins could not readily be aligned with the
structural template of ToxI because of their length and
sequence divergence from the canonical ToxI antitoxin
family. Alignment within the CptI and TenpI families
was also not possible, as these longer sequences generally
contained multiple possible pseudoknot base-pairing
regions, and because the offset of the processed antitoxic
RNA relative to the DNA repeat is not known for any
members of these families.

Figure 5. Protection of E. coli DH5α from Type III toxins by cognate
antitoxins. Protection assays were performed as described in Materials
and Methods. Results for the toxIN system of P. atrosepticum have
been published previously (9); data from a single toxIN experiment are
included for illustrative purposes. Of the four new loci tested, all toxin
genes reduced the viability of the host E. coli, which could then be restored
by the full cognate antitoxin. Data shown are the mean values from
triplicate experiments, with standard deviations represented by error
bars.

Functional analysis of putative Type III
toxin-antitoxin loci

To determine whether the putative systems function as
Type III TA loci, we assessed the toxicity of each
proposed toxin gene and the ability of the cognate
antitoxic repeats to counteract its lethal effects. The toxin
genes from the tenpIN family locus of P. luminescens
TT01 and the cptIN family loci of C. catus GD/7,
R. torques L2-14 and E. rectale DSM 17629 were cloned,
together with their native ribosome binding sites, under
the control of the L-arabinose-inducible promoter in
pBAD30 (33). The respective antitoxic repeats, either as
single repeats or as the full tandem array, were cloned into
pTA100, a spectinomycin-resistant derivative of pQE-80L
(9), from which they could be over-expressed by the
addition of IPTG. By co-transforming E. coli DH5α with
both toxin and antitoxin plasmids, we were able to assess
the toxicity and antitoxicity of each component (Figure 5).
In the case of P. luminescens, a single TenpI repeat
provided no antitoxicity over and above the empty vector
control (Figure 5). However, the full tenpI repeat array of
P. luminescens inhibited the cognate toxin (Figure 5).
This implies that the active pseudoknot from this locus
spans the DNA repeat boundary, so that at least two DNA
repeats are required to form at least one active inhibitory
pseudoknot. Following the results from P. luminescens,
the full antitoxin arrays were cloned from C. catus,
R. torques and E. rectale. These loci acted as expected,
with each toxin inhibited by its cognate antitoxic RNA.
The cloned antitoxin sequences do not encode any
predicted translatable open reading frames, supporting
the model that the repeats encode antitoxic RNA
pseudoknots rather than an antitoxic peptide.
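The inference that one repeat is not enough can be illustrated with a simple counting model. Assume, hypothetically, that the functional pseudoknot is exactly as long as one DNA repeat but begins part-way into it. The sketch counts how many complete pseudoknot-length windows fit inside an array of n repeats; the 40-nt unit and 10-nt offset are illustrative values, not measurements from this study.

```python
def complete_windows(n_repeats: int, unit_len: int, offset: int) -> int:
    """Count complete unit-length windows that start `offset` nt into the first repeat
    and fit entirely inside a tandem array of n_repeats copies."""
    array_len = n_repeats * unit_len
    start = offset % unit_len
    count = 0
    while start + unit_len <= array_len:
        count += 1
        start += unit_len
    return count


for n in (1, 2, 3):
    aligned = complete_windows(n, unit_len=40, offset=0)
    shifted = complete_windows(n, unit_len=40, offset=10)
    print(n, aligned, shifted)
```

With any non-zero offset, a single repeat yields no complete window, while two repeats yield one, which matches the behaviour of the single TenpI repeat versus the full array.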

ToxIN was first identified as an abortive infection
system and is known to be active in multiple host
backgrounds, providing protection against a range of
bacteriophages (9,21). To assess whether the newly
identified Type III TA families might also mediate abortive
infection, we attempted to clone the full tenpIN locus of
P. luminescens and the cptIN loci of C. catus, E. rectale
and R. torques into pBR322. No recombinants were
obtained from the C. catus cloning. The other three
plasmids were constructed successfully, although,
qualitatively, the strain containing pTRB265, encoding the
cptIN locus from R. torques, formed smaller colonies than
the other recombinant strains.

Strains of E. coli DH5α containing the test plasmids,
alongside a vector control, were infected with new
environmental coliphages isolated from treated sewage
effluent collected on three independent visits to a sewage
treatment plant. Following isolation, individual phages
were re-tested against all the cloned Type III TA systems
to obtain efficiency of plating (EOP) data as a measure of
phage resistance (Table 3). As expected, the toxIN system
from P. atrosepticum dramatically reduced the EOP of
three phages (Table 3). The tenpIN locus from
P. luminescens also reduced the EOP of the same three
phages. The cptIN loci from E. rectale and R. torques did
not affect the EOPs of any of the six phages tested. This
confirms that at least one of the two new Type III TA
families, tenpIN, has abortive infection capacity and can
provide high levels of resistance to bacteriophage
infection.
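EOP is conventionally the phage titre on the test strain divided by the titre on a sensitive control strain, here the vector control. The sketch below computes it from plaque counts; the counts, dilution factors and plated volume are hypothetical and are not data from Table 3.

```python
def titre(plaques: int, dilution: float, volume_ml: float) -> float:
    """Phage titre in plaque-forming units (PFU) per ml.

    `dilution` is the dilution factor of the plated lysate (e.g. 1e-6).
    """
    return plaques / (dilution * volume_ml)


def efficiency_of_plating(test_titre: float, control_titre: float) -> float:
    """EOP: titre on the test strain divided by titre on the vector-control strain."""
    return test_titre / control_titre


# Hypothetical plate counts, for illustration only (not data from Table 3).
control = titre(plaques=150, dilution=1e-6, volume_ml=0.1)  # ~1.5e9 PFU/ml
test = titre(plaques=30, dilution=1e-2, volume_ml=0.1)      # ~3.0e4 PFU/ml
eop = efficiency_of_plating(test, control)
print(f"EOP = {eop:.1e}")  # EOP = 2.0e-05
```

An EOP near 1 means the TA system gives no protection, while values several orders of magnitude below 1 indicate strong resistance.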



DISCUSSION 

Previous global studies have used and developed
automated methods to consolidate and extend lists of Type I
and Type II TA loci, while also validating selected new
entries (5,12,16). Type III TA loci are a recent discovery,
and to date only a limited list, containing 19 homologues
of one family, has been published (9). The aim of the
current study was to search for, catalogue, define and
characterize new Type III systems and families.

Table 3. Bacteriophage resistance provided by Type III toxin-antitoxin systems

Identification of Type III toxin-antitoxin systems

Because of the complexity of the specific features required
of the nucleotide sequences defining a Type III TA locus,
no method was readily available for fully automated
screening of the collected sequence databases. Following
our recent crystal structure of ToxIN (4), it became
possible to screen for Type III TA systems using the ToxN
structure in structure-based homology searches. Analysis
of the resulting hits identified 37 putative Type III TA
systems, which were divided into three families.
Exhaustive sequence homology searches then increased
the number of members within each family, producing a
final list of 125 putative Type III TA loci. This is the first
identification of Type III TA loci encoding toxins with no
significant amino acid sequence similarity to the initial
Type III TA toxin, ToxN. It confirms the prediction that
the 'Type III' descriptor extends to further families and is
not limited to ToxN and its homologues. This list also
greatly extends the known and predicted homologues
within the toxIN family. Because this study was driven by
an initial structure-based homology search, the results are
clearly biased towards a subset of database entries, and it
is highly likely that many more Type III TA families
remain to be identified. An automated system that
combines searches for tandem repeats or pseudoknot
sequences with searches for palindromic repeats and an
adjacent ORF would be a very powerful tool for further
expanding our understanding of the number and
spread of Type III TA loci.
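One ingredient of such a pipeline is a tandem-repeat scan. The sketch below is a deliberately naive version that only finds exact, non-overlapping copies; real antitoxic repeats are imperfect and often end in partial copies (Table 2 reports mean copy numbers such as 2.8), so a practical tool would need approximate matching, as provided by dedicated programs such as Tandem Repeats Finder. The function name and test sequence are hypothetical.

```python
def find_tandem_repeats(seq: str, min_unit: int, max_unit: int, min_copies: int = 2):
    """Scan left to right for exact, non-overlapping tandem arrays.

    Returns a list of (start, unit_length, copies) tuples, keeping at each start
    the unit length that gives the longest array.
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for this unit length
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            span = copies * unit_len
            if copies >= min_copies and (best is None or span > best[1] * best[2]):
                best = (i, unit_len, copies)
        if best is not None:
            hits.append(best)
            i += best[1] * best[2]  # jump past the reported array
        else:
            i += 1
    return hits


# Hypothetical sequence: three exact copies of a 10-nt unit flanked by other bases.
toy = "GATC" + "AAGGCTTACG" * 3 + "CCGG"
print(find_tandem_repeats(toy, min_unit=8, max_unit=12))  # [(4, 10, 3)]
```

In a full search, each hit would then be checked for a downstream terminator-like inverted repeat and an adjacent ORF.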

Distribution and abundance of Type III toxin-antitoxin 
systems 

The Type III families identified in this study are
dominated by entries from the Firmicutes and Fusobacteria
(Figure 2). For toxIN and tenpIN, this also extends in part
to the Proteobacteria. The distribution of Type III systems
among these Phyla does not appear to be biased by the
relative numbers of sequenced strains from each Phylum:
as of September 2011, of the total bacterial genome
projects in the NCBI, 28% were for the Firmicutes, 44%
for the Proteobacteria and <1% for the Fusobacteria.
Both Type I and Type II TA systems also appear
over-represented within the Firmicutes and Proteobacteria
(5,12).

Type III loci were identified in bacteria with a wide
range of lifestyles (Table 1 and Supplementary Table
S1). The most pertinent category may be that of the
human pathogens. Bacillus cereus Rock1-15 can cause
food poisoning, and several clinical isolates were also
identified: Yersinia pseudotuberculosis IP 31758, the two
Vibrio strains and the Staphylococcus aureus strains.
Perhaps of greatest clinical relevance is the presence of
tenpIN loci on conjugative multi-resistance plasmids
from S. aureus. Although only plasmid pVRSA (38) is
listed in Table 1, many other S. aureus plasmids contain
these systems, including pGO1, which is associated
with aminoglycoside resistance (39), and the related
plasmid pSK41, a member of the β-lactamase-heavy-
metal-resistance plasmid family (40).

Notably, 70 of the 125 hits come from strains sequenced
as part of the Human Microbiome Project (HMP)
(http://www.hmpdacc.org/) (41). A further eight hits are
from strains in the metaHIT project (http://www.metahit.eu/).
Whereas metaHIT focuses solely on the human intestinal
tract, the HMP samples many sites around the healthy
human body. As of October 2011, the HMP had deposited
~800 of the ~12 500 bacterial genome project sequences in
the NCBI database. Because this small proportion of HMP
sequences (~6% of the database) contains over half of
the putative Type III TA systems, these loci appear to be
well represented within human commensals.
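The over-representation argument can be made explicit with the approximate figures quoted above. This is only a rough indicator: the numerator counts TA loci while the denominator counts genome projects, and a single strain can carry several loci.

```python
hmp_hits, total_hits = 70, 125            # Type III loci from HMP strains / all loci
hmp_projects, all_projects = 800, 12_500  # approximate NCBI genome-project counts

share_of_hits = hmp_hits / total_hits            # fraction of loci from HMP strains
share_of_projects = hmp_projects / all_projects  # fraction of projects from the HMP
enrichment = share_of_hits / share_of_projects

print(f"{share_of_hits:.0%} of hits from {share_of_projects:.1%} of projects "
      f"(about {enrichment:.1f}-fold over-representation)")
```

On these numbers, HMP strains contribute 56% of the loci from about 6.4% of genome projects, roughly a nine-fold over-representation.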

The number of Type III loci present within a strain
(a maximum of six thus far) is currently low compared
with the large numbers of Type II systems (up to 97) or
the medium numbers of Type I systems (up to 26)
identified in other strains (5,12). These values for Type I
and Type II systems have increased steadily since the
initial discovery of TA systems; further global searches
for Type III systems will probably increase these numbers
in a similar fashion.

Functional roles of Type III toxin-antitoxin systems 

Using co-over-expression assays, members of both the
cptIN and tenpIN families were confirmed as active TA
systems in an E. coli model (Figure 5). ToxIN from
P. atrosepticum also acts as an abortive infection system
in the endogenous and other enteric host strains (9,21),
and TenpIN from P. luminescens was shown to abort
infection by coliphages in E. coli (Table 3); nevertheless,
it cannot be concluded that this is the natural 'role' of
Type III TA loci. Furthermore, the cptIN loci tested did
not provide resistance against this small subset of phages
in an E. coli model. Notably, both the toxIN and tenpIN
systems tested were identified in enteric bacteria, so
phage resistance may be a specific attribute of Type III TA
systems from this host taxon. We also used an enteric
model of phage infection, whereas the cptIN loci came
from Firmicutes; the coliphages used may therefore be too
distantly related to the phages targeting E. rectale and
R. torques to be recognized and aborted, or the cptIN
systems may not be correctly expressed in E. coli.
Transferred back to their natural hosts, these cptIN loci
may be active against cognate phages, though they may
also fulfil an entirely different function. Further
investigations are required to identify cellular roles for
cptIN. The original biological role of Type III TA systems
is likely to remain under debate. It will be interesting to
study the activities of other Type III TA systems, both
in model systems and in their endogenous hosts, to
investigate their full physiological capabilities and
evolutionary significance.

CONCLUSION 

The numbers of known Type I and Type II TA systems
have greatly increased as methods of identification have
been streamlined and the sequence databases have swelled
with entries. In comparison, although this study identifies
new Type III TA systems for the first time, the numbers of
families and of examples within each family are
comparatively small relative to the Type I and II systems.
Further studies of the full sequence databases, performed
in an automated and unbiased fashion, are highly likely to
prove useful and instructive. We predict that many more
undiscovered Type III TA systems remain to be
examined. Doing so will provide a greater understanding
of their diversity, abundance and biological roles, and will
provide a useful range of molecular 'reagents' with
which to study protein-RNA interactions.